fix(infra): replace httpx with aiohttp + requests for long-running asyncio stability #17

Open
GrigoryEvko wants to merge 808 commits into FusionBrainLab:main from GrigoryEvko:fix/aiohttp-network-layer

GrigoryEvko commented May 15, 2026

Closes #9.

Replaces direct httpx usage in this codebase with a two-stack network layer: aiohttp under the openai SDK for async paths, requests + urllib3 for sync paths. Drops import httpx from gigaevo/, problems/, and tests/ — httpx remains a transitive dependency through openai / langchain-openai / langfuse / litellm but is no longer a surface this codebase directly maintains. The migration targets the failure mode the issue documents: long-running asyncio processes accumulating connections that httpcore's async semaphore never releases, producing silent PoolTimeout cascades on the timescale of hours.

What landed

New gigaevo/infra/ modules

  • aiohttp_factory.py — make_aiohttp_session(role, ...) returns aiohttp.ClientSession; make_openai_http_client(role, *, proxy, **overrides) returns openai.DefaultAioHttpClient, the aiohttp-backed http_client adapter shipped in the openai[aiohttp] extra. A _KeepaliveConnector subclasses TCPConnector and applies DEFAULT_SOCKET_OPTIONS to each transport after _wrap_create_connection (aiohttp doesn't expose socket_options directly).
  • requests_factory.py — make_requests_session(role, ...) returns a requests.Session with _KeepaliveHTTPAdapter mounted on both http:// and https://. The adapter forwards socket_options and ssl_context through init_poolmanager to urllib3.PoolManager so every minted connection inherits the keepalive settings. _TimeoutSession injects a default per-request timeout split into (connect, read) because requests has no session-level timeout knob.
  • _net.py — shared DEFAULT_SOCKET_OPTIONS (TCP_NODELAY + SO_KEEPALIVE + per-OS probe timing; Linux drops dead-peer detection from net.ipv4.tcp_keepalive_time=7200 to ~120 seconds via TCP_KEEPIDLE=60s / KEEPINTVL=10s / KEEPCNT=6) and build_tls_context() (TLS 1.2+ minimum, hostname + CA verification on).
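
As a rough illustration of the keepalive numbers above, the sketch below builds an option list in the (level, optname, value) shape that urllib3 accepts and checks the ~120 second figure. The function names are hypothetical and are not the contents of _net.py; the macOS branch mentioned elsewhere in this PR is omitted.

```python
import socket
import sys

# Probe timing quoted in the PR description (seconds, seconds, probe count).
KEEPIDLE_S = 60
KEEPINTVL_S = 10
KEEPCNT = 6


def default_socket_options():
    """Return (level, optname, value) tuples; hypothetical stand-in for DEFAULT_SOCKET_OPTIONS."""
    opts = [
        (socket.IPPROTO_TCP, socket.TCP_NODELAY, 1),
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
    ]
    # The per-probe constants below only exist in the socket module on Linux.
    if sys.platform.startswith("linux"):
        opts += [
            (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, KEEPIDLE_S),
            (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, KEEPINTVL_S),
            (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, KEEPCNT),
        ]
    return opts


def dead_peer_detection_seconds():
    """Idle time before the first probe plus the time for all unanswered probes."""
    return KEEPIDLE_S + KEEPINTVL_S * KEEPCNT  # 60 + 10 * 6 = 120
```

The arithmetic is the point of the sketch: a silent peer is declared dead after the idle period plus six unanswered probes spaced ten seconds apart.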

Defaults targeting the bug profile

  • Bounded ClientTimeout(total=300s, connect=15s, sock_read=300s, sock_connect=15s) — no None pool waits, LLM-friendly read window.
  • keepalive_timeout=20s — shorter than common LB idle thresholds (AWS ALB 60s, NGINX 75s, CloudFlare 90s) so connections age out before the LB closes them asymmetrically.
  • limit_per_host=100 (was effectively unbounded on the old direct-httpx usage).
  • urllib3 Retry(total=5, connect=5, read=5, status=5, backoff_factor=0.5, backoff_jitter=0.5, backoff_max=30s). POST is excluded from allowed_methods (urllib3's safe default) — retrying a successfully-processed POST creates a duplicate resource. status_forcelist covers 408 / 425 / 429 / 5xx; respect_retry_after_header=True.
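
To make the retry defaults concrete, here is a small pure-Python model of the sleep schedule they imply. It assumes the backoff formula described in the urllib3 2.x Retry documentation (no sleep before the first retry, then backoff_factor * 2 ** (n - 1), plus up to backoff_jitter seconds, capped at backoff_max); it does not call urllib3 and is not the factory code from this PR.

```python
import random


def backoff_seconds(consecutive_errors, *, backoff_factor=0.5,
                    backoff_jitter=0.5, backoff_max=30.0, rng=random.random):
    """Model of the sleep before the next attempt after N consecutive errors."""
    if consecutive_errors <= 1:
        return 0.0  # the first retry is issued immediately
    value = backoff_factor * (2 ** (consecutive_errors - 1))
    if backoff_jitter:
        value += rng() * backoff_jitter  # desynchronizes clients that failed together
    return float(max(0.0, min(backoff_max, value)))


# Jitter disabled so the schedule is deterministic: 0, 1, 2, 4, 8 seconds.
schedule = [backoff_seconds(n, backoff_jitter=0.0) for n in range(1, 6)]
```

With total=5 the un-jittered waits sum to 15 seconds, so the backoff_max=30s cap only matters if a caller raises the retry count.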

Call site migration

  • problems/prompts/client.py, problems/chains/client.py — the LLM clients now construct AsyncOpenAI with http_client=make_openai_http_client(...). The problems/chains/client.py timeout=None (which let the pool wait indefinitely — the silent-forever-hang from the issue) is removed; the factory provides bounded defaults.
  • gigaevo/memory/shared_memory/concept_api.py — _ConceptApiClient switches from sync httpx.Client to a requests.Session from the factory. Sync httpx isn't subject to the documented async-pool bug, but removing it consolidates the codebase's network footprint.
  • gigaevo/infra/balanced_chat.py — BalancedChatOpenAI gains an explicit http_async_client kwarg, forwarded to every per-endpoint ChatOpenAI via kwargs.setdefault. Default None preserves prior behavior; callers opt in for shared aiohttp pool semantics across all endpoints.
  • gigaevo/memory/ideas_tracker/components/fabrics/llm_clients_fabric.py — the AsyncOpenAI client now passes http_client=make_openai_http_client("ideas_tracker_async"). The sibling sync OpenAI is left on the openai SDK default.
  • gigaevo/llm/models.py — MultiModelRouter._verify_models swaps urllib.request.urlopen for a make_requests_session("model_verify", timeout=(5.0, 10.0)) session. The probe now carries the same retry / keepalive / TLS settings as the rest of the sync layer.

Tests

  • tests/infra/test_aiohttp_factory.py (24 tests) — connector defaults, enable_cleanup_closed passed regardless of platform, TLS context pins TLSv1.2 + check_hostname + CERT_REQUIRED, socket options shape on Linux, factory delegation to DefaultAioHttpClient.
  • tests/infra/test_requests_factory.py (22 tests) — POST excluded from retry allowed_methods, status_forcelist contains 408/429/503, _TimeoutSession.request forwards extra kwargs verbatim, adapter pushes socket_options and ssl_context to PoolManager, default-timeout tuple normalization.
  • tests/infra/test_balanced_chat.py — new TestHttpAsyncClientForwarding covers explicit-kwarg propagation to every endpoint and the no-kwarg-no-injection default.
  • tests/memory/_fake_http.py — replaces httpx.MockTransport with a requests.Session.request patch keeping the previous handler interface (request.method, request.url, request.content) so the existing memory test handlers continue working unchanged.

Dependency changes

  • openai>=1.0.0 → openai[aiohttp]>=2.0.0. DefaultAioHttpClient requires the aiohttp extra in openai 2.x.
  • aiohttp>=3.10 and requests>=2.31 added explicitly — both are now first-party imports in this codebase.
  • Direct httpx>=0.27.0 dropped. Stays available transitively.

Deliberately not in this PR

  • Per-library transport patchers for langfuse / langsmith / litellm internals. Those libraries use their own httpx clients for telemetry; monkey-patching them carries upgrade risk and they aren't the hot path the issue documents. A separate effort if needed.
  • problems/dashboard/validate.py and gigaevo/memory/openai_inference.py — both use sync OpenAI. Sync httpx isn't the documented bug surface and the openai SDK's sync interface is built around httpx.Client; routing those through anything else requires architectural changes out of scope.
  • Replacing sync urllib.request usage outside the LLM router (gigaevo/utils/..., tools/...). Those are admin/CLI paths, not the hot path.
  • Hydra config updates to start passing http_async_client to BalancedChatOpenAI. The kwarg is added; wiring callers is a follow-up.

ShiftingBorders and others added 30 commits April 1, 2026 16:00
KhrulkovV and others added 20 commits April 6, 2026 13:59
refactor(memory): split card_conversion into focused modules
Define MemoryError, MemoryRetrieverError, MemorySearchError, and
MemoryStorageError in gigaevo/exceptions.py following the existing
GigaEvoError hierarchy.

Wire them into the memory subsystem:
- gam_search.build() wraps all failures in MemoryRetrieverError
- memory.py narrows two gam.build() catches from bare Exception
- card_store._load() narrows to (json.JSONDecodeError, OSError)
- card_dedup import block narrows to (ImportError, OSError)

Resilience-critical catches (search fallback, merge loop, __exit__)
remain broad by design.
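
A minimal sketch of the narrowing pattern this commit describes. The class names come from the commit message, but the parent/child relationships and the load_index helper are assumptions made for illustration; the real gigaevo/exceptions.py may be organized differently.

```python
import json


class GigaEvoError(Exception):
    """Stand-in for the project's root exception."""


class MemoryError(GigaEvoError):  # shadows the builtin MemoryError inside this module only
    """Assumed base class for memory-subsystem failures."""


class MemoryRetrieverError(MemoryError):
    """Retriever construction or lookup failed."""


class MemorySearchError(MemoryError):
    """A search call failed."""


class MemoryStorageError(MemoryError):
    """Reading or writing persisted cards failed."""


def load_index(path):
    """Hypothetical loader: catch only the failures a JSON file read can produce."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except (json.JSONDecodeError, OSError) as exc:
        # Anything else (a programming error, for instance) propagates untouched.
        raise MemoryStorageError(f"cannot load index {path!r}: {exc}") from exc
```

Narrow catches keep genuine bugs visible, while the wrapped error still carries the original exception as its cause.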

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(memory): custom exception hierarchy and narrowed catches
…t base to ABC

- concept_api.py: all 5 RuntimeError raises → MemoryStorageError
  (matches gigaevo/database pattern of wrapping I/O errors)
- base.py: GigaEvoMemoryBase now uses ABC + @abstractmethod
  (matches MutationOperator, Stage, LangGraphAgent pattern)
- card_dedup.py: narrow two broad catches:
  - JSONL read fallback: except Exception → (json.JSONDecodeError, OSError)
  - GAM store build: except Exception → (MemoryRetrieverError, OSError)
- Update 6 test assertions from RuntimeError to MemoryStorageError

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ormity

refactor(memory): exception conformity + ABC base class
When write_pipeline.py passes MemoryCard/ProgramCard Pydantic models to
memory_platform.save_card(), the dict() call on a Pydantic model doesn't
properly flatten nested Pydantic objects like ConnectedIdea. This caused
TypeError in _persist_index() when json.dumps() tried to serialize.

Root cause: write_pipeline returns list[AnyCard] (Pydantic models) and both
backends (memory_platform and memory/shared_memory) consume these cards via
save_card(). memory_platform's normalize_memory_card() must explicitly call
.model_dump() on Pydantic inputs to flatten nested objects.

Fix verified: all 788 memory + integration tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests the exact bug path: Pydantic MemoryCard/ProgramCard with nested
ConnectedIdea and MemoryCardExplanation objects must be properly
flattened to plain dicts before JSON serialization.

6 tests covering: ProgramCard with ConnectedIdea, MemoryCard with
MemoryCardExplanation, plain dict passthrough, JSON round-trips, None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add gigaevo-memory Git dependency to pyproject.toml
- Remove sys.path manipulation from memory_platform/memory.py and
  remote_gam_retriever.py (no longer needed with proper install)
- Simplify test file to use direct imports instead of module mocking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expands from 6 to 11 tests covering the complete save_card → _persist_index
flow with Pydantic inputs. Tests verify:

- normalize_memory_card: ConnectedIdea/MemoryCardExplanation → dict
- save_card: Pydantic ProgramCard/MemoryCard → JSON-serializable index
- _card_to_backend_content: API payload is clean dict
- persist/reload roundtrip: index file survives write→read cycle

Uses _make_platform_memory() factory with mocked API client to test
memory_platform in isolation without network dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docstrings to 15 public methods across 5 files (memory.py,
  concept_api.py, card_dedup.py, openai_inference.py, write_pipeline.py)
- Add return type annotations to 4 functions in amem_gam_retriever.py
- Fix 2 mypy errors: annotate retrievers dict, rename variable in api_sync.py
- Extract magic numbers: _MAX_SUMMARY_CHARS, _MAX_DESCRIPTION_CHARS,
  _ENTITY_NAME_MAX_LENGTH, _MAX_CONNECTED_DESCRIPTIONS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
refactor(memory): type annotations, docstrings, constants, platform bug fix
New gigaevo/infra/aiohttp_factory.py exposes two factories:

- make_aiohttp_session(role, ...) returns aiohttp.ClientSession with a
  hardened TCPConnector (limit=300, limit_per_host=50,
  keepalive_timeout=30s, enable_cleanup_closed=True) and a bounded
  ClientTimeout (total=120s, connect=10s, sock_read=120s,
  sock_connect=10s). No component defaults to None — the
  silent-forever-hang on the pool timeout is structurally unreachable.

- make_openai_http_client(role, ...) returns
  openai.DefaultAioHttpClient, the aiohttp-backed http_client adapter
  shipped in the openai[aiohttp] extra. Surfaces a clear ImportError
  with install instructions when the extra is missing.

The ``role`` argument is informational (memory_api, llm_prompts, ...).
Currently unused beyond receipt; reserved for per-role override hooks.

Build helpers (build_connector / build_timeout) are exported for callers
that need to customize one knob while keeping the rest of the defaults.

14 unit tests covering: hardened defaults, override hatches, trust_env
behavior, proxy forwarding through to DefaultAioHttpClient, and
ImportError messaging.
problems/prompts/client.py was constructing a bare httpx.AsyncClient(proxy=...)
with no limits, no keepalive_expiry, and the openai-default 5s total timeout.

problems/chains/client.py was constructing httpx.AsyncClient(timeout=httpx.Timeout(timeout=None, connect=30.0)) —
the timeout=None covers read/write/pool, which means the pool can wait
indefinitely for a connection slot. That is the silent forever-hang failure
shape reported in upstream encode/httpx discussion threads about long-running
asyncio processes accumulating leaked connections from cancelled tasks.

Both modules now build their http_client via the centralized aiohttp factory
(make_openai_http_client). The factory returns openai.DefaultAioHttpClient
which speaks the httpx-compatible interface the openai SDK expects but
routes traffic through aiohttp's connection pool — pool semantics that
survive multi-hour asyncio loads. Bounded ClientTimeout (total=120s,
connect=10s, sock_read=120s, sock_connect=10s) plus
TCPConnector(limit=300, limit_per_host=50, keepalive_timeout=30,
enable_cleanup_closed=True) replace the previous ad-hoc configuration.

Tenacity retry config is unchanged — it has no exception-type filter, so
aiohttp.ClientError and friends are caught on the same retry path that
previously caught httpx errors.
…3 pool

Removes the last direct httpx import from gigaevo/. The async-pool
deadlock documented in issue FusionBrainLab#9 is specific to httpx.AsyncClient over
httpcore; sync httpx.Client isn't the bug surface, but eliminating the
import keeps the codebase's network footprint consistent — aiohttp for
async paths, requests + urllib3 for sync paths.

New gigaevo/infra/requests_factory.py:
- make_requests_session(role, *, timeout, pool_connections, pool_maxsize,
  retry) returns a requests.Session backed by urllib3's connection pool.
- _TimeoutSession subclass injects a default per-request timeout because
  requests has no session-level timeout knob; callers can still override
  per-call.
- build_retry() exposes the urllib3.Retry policy with conservative
  defaults (total=3, backoff_factor=0.5, retry on 5xx, allow retry on
  HEAD/GET/OPTIONS/PUT/DELETE — POST excluded so a successfully-
  processed POST isn't duplicated by retry, raise_on_status=False so
  the client formats its own error message).

_ConceptApiClient:
- Uses make_requests_session("memory_api", timeout=...) instead of
  httpx.Client. ApiSync stays sync.
- requests.exceptions.ConnectionError / requests.exceptions.Timeout
  replace the previous httpx.ConnectError / httpx.TimeoutException
  branches with the same MemoryStorageError message text.

Tests:
- tests/memory/_fake_http.py replaces the httpx.MockTransport pattern:
  make_mocked_client(handler) builds a _ConceptApiClient whose
  _http.request is patched to dispatch into the test handler. The
  handler receives a small _FakeRequest with the same attribute surface
  (method / url / content / headers) the previous httpx-based handlers
  expected, so per-test handler code keeps working without rewriting.
- make_fake_response(status_code, *, json, text) builds a
  requests.Response with httpx.Response-compatible constructor shape.

Library footprint after this commit:
- aiohttp for async HTTP paths (via openai.DefaultAioHttpClient,
  introduced earlier in this branch)
- requests + urllib3 for sync HTTP paths (this commit)
- No `import httpx` remains in gigaevo/, problems/, or tests/.
  httpx stays as a transitive dependency through openai / langchain-
  openai / langfuse / litellm; it's gone from our own modules.
…acks

Adds gigaevo/infra/_net.py — single source of truth for the parts the
sync (requests) and async (aiohttp) factories must agree on:

- DEFAULT_SOCKET_OPTIONS: SO_KEEPALIVE plus per-OS probe timing
  (TCP_KEEPIDLE/INTVL/CNT on Linux; TCP_KEEPALIVE on macOS), shifted
  from the kernel default of net.ipv4.tcp_keepalive_time=7200 (two
  hours, useless for HTTP) down to IDLE=60s / INTVL=10s / CNT=6 — a
  dead peer is detected in ~120s rather than 2h. TCP_NODELAY=1 stays
  in the list so urllib3's default isn't lost when callers compose
  their own option list. apply_socket_options() applies each option
  best-effort and swallows individual OSErrors so kernel rejection of
  one tuning option never fails an otherwise-good connection.
- build_tls_context(): ssl.SSLContext pinned to TLS 1.2+ with
  hostname checking and CA verification on. Both libraries already
  default to similar contexts; this makes the properties explicit so
  they survive library upgrades and are visible in our code instead of
  implicit in ssl.create_default_context.
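
The best-effort behavior described for apply_socket_options() could look roughly like the sketch below. Returning the list of rejected options is an addition made here for illustration; the commit message only says that individual OSErrors are swallowed.

```python
import socket


def apply_socket_options(sock, options):
    """Apply each (level, optname, value) tuple; never raise on a rejected option.

    Returns the options the kernel refused (an illustrative extra, handy for logging).
    """
    rejected = []
    for level, optname, value in options:
        try:
            sock.setsockopt(level, optname, value)
        except OSError:
            # One unsupported tuning knob must not fail an otherwise healthy connection.
            rejected.append((level, optname, value))
    return rejected
```

Applying options one at a time, rather than in a single try block, is what keeps a rejection of the third option from silently skipping the fourth and fifth.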

aiohttp_factory.py:
- _KeepaliveConnector subclasses aiohttp.TCPConnector. aiohttp doesn't
  expose socket_options on the connector — the only way to set per-
  connection SO_KEEPALIVE is to intercept transport creation. The
  subclass overrides _wrap_create_connection and applies the options
  to the underlying socket from transport.get_extra_info('socket')
  after the connection completes.
- Default keepalive_timeout=20s (was 30s) — short enough to age out
  before any common load balancer's idle timeout (AWS ALB 60s,
  NGINX 75s, CloudFlare 90s) closes us asymmetrically.
- Default limit_per_host=100 (was 50) — LLM endpoints saturate.
- Default ClientTimeout total=300s, sock_read=300s, connect=15s — LLM
  generations occasionally take several minutes; the previous 120s
  defaults were too tight.
- Default ssl_context = build_tls_context().

requests_factory.py:
- _KeepaliveHTTPAdapter subclasses HTTPAdapter to thread socket_options
  and ssl_context through init_poolmanager into the underlying
  PoolManager — required so the options reach every connection minted
  by the pool, not just the first.
- Default timeout split into (connect=15s, read=60s) tuple; a single
  float is normalized to (value, value) so legacy callsites continue
  working.
- Default Retry: separated connect / read / status counts (5 each),
  backoff_jitter=0.5 defeats synchronized retry storms,
  backoff_max=30s caps the worst-case sleep, status_forcelist now
  includes 408 and 425 along with 429 and 5xx. POST is excluded from
  allowed_methods (urllib3's safe default) — retrying a successfully-
  processed POST creates a duplicate resource.
- Default ssl_context = build_tls_context().
- Default pool_connections=20, pool_maxsize=100 (was 10, 50).

Test coverage:
- DEFAULT_SOCKET_OPTIONS contains SO_KEEPALIVE + TCP_NODELAY on every
  platform; TCP_KEEPIDLE on Linux.
- apply_socket_options swallows individual setsockopt failures.
- ssl context default pins TLSv1_2 + check_hostname + CERT_REQUIRED.
- _KeepaliveHTTPAdapter pushes socket_options and ssl_context through
  to the PoolManager's connection_pool_kw.
- _KeepaliveConnector is the concrete TCPConnector subclass returned
  by build_connector(); enable_cleanup_closed and ttl_dns_cache are
  forwarded from build_connector kwargs (asserted via construction
  spy because aiohttp 3.13 doesn't expose ttl_dns_cache publicly).
- Retry excludes POST; status_forcelist contains 408/429/503.
- _TimeoutSession.request forwards extra kwargs verbatim.

168/168 tests pass across tests/infra/ + tests/memory/. ruff clean.
…sites

Three call sites in the LLM stack constructed openai SDK clients with no
http_client kwarg, leaving each instance to mint its own internal
connection pool. After this commit they can opt into the aiohttp-backed
factory built earlier in the branch:

- BalancedChatOpenAI gains an explicit ``http_async_client`` constructor
  kwarg. When provided, it's forwarded to every per-endpoint ChatOpenAI
  via kwargs.setdefault, so all N endpoints share a single underlying
  connection pool. Default ``None`` preserves the historical behavior of
  letting each ChatOpenAI mint its own client — opt-in, not enforced.

- The ideas_tracker AsyncOpenAI client now passes
  http_client=make_openai_http_client("ideas_tracker_async") at
  construction. The sibling sync OpenAI client is intentionally left on
  the openai SDK default (sync httpx) — sync httpx is not subject to
  the documented async-pool deadlock and the openai SDK's sync interface
  is built around httpx.Client.

- MultiModelRouter._verify_models was using urllib.request.urlopen
  directly with no retry/keepalive/TLS configuration. Swapped to a
  session minted by make_requests_session("model_verify",
  timeout=(5.0, 10.0)). The probe now inherits the same urllib3 retry
  policy and TCP keepalive socket options as the rest of the sync HTTP
  layer.
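
The opt-in forwarding described for BalancedChatOpenAI can be sketched with stand-in classes. FakeChatOpenAI and build_endpoint_clients are hypothetical names used only to show the kwargs.setdefault pattern; they are not the real langchain or gigaevo classes.

```python
class FakeChatOpenAI:
    """Hypothetical stand-in for a per-endpoint chat client; it only records its kwargs."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


def build_endpoint_clients(endpoints, *, http_async_client=None, **kwargs):
    """Create one client per endpoint, sharing http_async_client when one is given."""
    if http_async_client is not None:
        # setdefault leaves an http_async_client already present in kwargs untouched.
        kwargs.setdefault("http_async_client", http_async_client)
    return [FakeChatOpenAI(base_url=url, **kwargs) for url in endpoints]


shared_client = object()  # placeholder for the aiohttp-backed client
clients = build_endpoint_clients(
    ["http://a.example/v1", "http://b.example/v1"],
    http_async_client=shared_client,
)
```

Because the default is None and nothing is injected in that case, existing callers see no change until they pass a client explicitly.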

Not touched in this commit:
- problems/dashboard/validate.py — single ChatOpenAI driven via the
  sync .invoke() path, runs once per dashboard score, not on a hot
  loop. Sync httpx default is appropriate here.
- gigaevo/memory/openai_inference.py — sync OpenAI() only, no async
  hot path.

Test coverage adds ``TestHttpAsyncClientForwarding`` to
test_balanced_chat.py — verifies an explicit http_async_client reaches
every per-endpoint ChatOpenAI, and the default ``None`` doesn't inject.
…ohttp + requests

Reflects the actual runtime footprint after the network layer rewrite:

- openai>=1.0.0 → openai[aiohttp]>=2.0.0. DefaultAioHttpClient lives in
  openai 2.x's aiohttp extra (which transitively pulls aiohttp and
  httpx-aiohttp). 2.x is required for the async migration to function.

- httpx>=0.27.0 is dropped from direct dependencies. No `import httpx`
  remains in gigaevo/, problems/, or tests/; httpx stays available as
  a transitive dep through openai / langchain-openai / langfuse /
  litellm but is no longer something this codebase imports directly.

- aiohttp>=3.10 added explicitly. It arrives transitively via the
  openai aiohttp extra, but the codebase now imports aiohttp directly
  (gigaevo/infra/aiohttp_factory.py + _net.py + concept_api callers
  via the openai SDK wrapping). Pinning a floor makes the dependency
  graph honest.

- requests>=2.31 added explicitly. _ConceptApiClient (the Memory API
  client) is now requests-based via gigaevo/infra/requests_factory.py;
  this codifies it as a direct dependency rather than relying on
  transitive presence through assorted other packages.
Several small failure-mode fixes around the aiohttp + requests
factories that came out of a self-review pass.

aiohttp_factory: ``make_openai_http_client`` was constructing the
openai SDK's ``DefaultAioHttpClient`` with no timeout override.  The
SDK default is ``httpx.Timeout(timeout=600, connect=5.0)`` — a 10
minute total per request, easy to hang a coroutine on a stuck
endpoint.  The factory now defaults to ``httpx.Timeout(timeout=300,
connect=15)`` to match the aiohttp session factory's bounded budget,
and accepts a per-call ``timeout=`` httpx.Timeout override.

requests_factory: ``_TimeoutSession.request`` used ``setdefault`` to
inject the default timeout, which meant an explicit ``timeout=None``
bypassed the default — and ``requests`` interprets ``None`` as
"wait forever".  Treat ``None`` and missing as the same: always
apply the session default unless a positive value is passed.
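
A sketch of the timeout resolution this paragraph describes, combined with the float-to-tuple normalization from the earlier commit. resolve_timeout is a hypothetical helper name, and the (15s, 60s) default is the one quoted for the requests factory.

```python
DEFAULT_TIMEOUT = (15.0, 60.0)  # (connect, read) seconds


def resolve_timeout(timeout=None, default=DEFAULT_TIMEOUT):
    """Return a (connect, read) tuple, never None.

    None and a missing value are treated the same, so an explicit timeout=None
    cannot turn into "wait forever".
    """
    if timeout is None:
        return default
    if isinstance(timeout, (int, float)):
        if timeout <= 0:
            return default  # only a positive value overrides the session default
        return (float(timeout), float(timeout))
    connect, read = timeout  # already a (connect, read) pair
    return (float(connect), float(read))
```

A session subclass would call a helper like this inside request() and overwrite the timeout kwarg instead of using setdefault, which is the behavioral change the commit makes.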

aiohttp_factory: ``enable_cleanup_closed=True`` was passed
unconditionally.  aiohttp 3.13 emits a noisy warning on CPython
3.12.13+ because the upstream SSL leak it defended against was fixed
in cpython PR #118960.  Pass the kwarg only on older runtimes.

Both factories now set a ``User-Agent: gigaevo-core/<version>``
header on every session so upstream WAFs / rate-limiters can apply
per-client policies instead of bucketing our traffic with anonymous
``python-requests`` / ``aiohttp`` defaults.  Caller-supplied headers
merge and win on collision.
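
One way to get the "merge and win on collision" behavior is shown below. The helper names and the placeholder version string are assumptions; the merge is case-insensitive because HTTP header names are, so a caller passing user-agent in lowercase still replaces the default.

```python
def user_agent(version="0.0.0"):
    """Build the product token; the version argument is a placeholder here."""
    return f"gigaevo-core/{version}"


def merge_headers(caller_headers=None, *, version="0.0.0"):
    """Start from the default User-Agent and let caller-supplied headers win."""
    merged = {"User-Agent": user_agent(version)}
    for name, value in (caller_headers or {}).items():
        # Remove any existing key that matches case-insensitively before inserting.
        for existing in [k for k in merged if k.lower() == name.lower()]:
            del merged[existing]
        merged[name] = value
    return merged
```

Both requests.Session.headers and aiohttp's session headers accept a plain mapping like this one, which is why a shared helper can serve both factories.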

concept_api: ``MemoryStorageError`` raised on ConnectionError /
Timeout now includes ``str(exc)`` (DNS failure vs ECONNREFUSED vs
SSL verify error is in the original exception, hardcoded message
was discarding that diagnostic).  HTTP error responses no longer
splice ``response.text`` raw into the exception message: a new
``_safe_response_preview`` strips ANSI / CR / LF / NUL control
bytes and truncates to 512 chars to avoid log-sink injection from a
misbehaving or attacker-controlled server.

_net: ``build_tls_context`` is now ``lru_cache(maxsize=1)`` — the
SSLContext parses the CA bundle, and the returned context is
treated as read-only by every caller in this package, so building
it once per process is enough.  New ``user_agent()`` helper for the
new headers.
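
The cached TLS context could be as small as the following sketch, which uses only the standard library. The body is an assumption about what "TLS 1.2+ with hostname checking and CA verification on" looks like in code, not a copy of _net.py.

```python
import ssl
from functools import lru_cache


@lru_cache(maxsize=1)
def build_tls_context():
    """Build the shared client-side TLS context once per process."""
    ctx = ssl.create_default_context()  # loads the system CA bundle
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already sets the next two; restating them keeps
    # the guarantees visible and independent of future library defaults.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

Caching is only safe because every caller treats the returned context as read-only; a caller that mutated it would change TLS behavior for the whole process.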

Tests cover all eight changes plus the existing happy paths.
build_retry_for_idempotent_post: new opt-in Retry policy that
includes POST in allowed_methods.  Callers whose server-side POST is
idempotent — chat completions on any OpenAI-compatible server are
the canonical case, since the server holds no state about prior
requests — get transient-failure retry coverage on POST.  The
conservative build_retry stays the default for resource-creating
POSTs (POST /v1/memory-cards et al.) where retrying would duplicate
side effects.

_verify_models thundering herd + missing auth: ``_verify_models`` ran
synchronously from MultiModelRouter.__init__ for every router, with
no shared state, no jitter, and no Authorization header.  Five
routers per driver, N drivers, all hitting GET {base_url}/models at
the same moment was a textbook boot storm.  Authenticated providers
(openai.com, OpenRouter) returned 401 silently because no API key
was forwarded.

Now the probe routes through ``_fetch_available_models_at`` which:

- Caches successful probe results per ``base_url`` for 5 minutes;
  multiple routers in the same process share the result.
- Caches failures for 30 s so retries are still possible without
  re-incurring the jitter sleep on every router instantiation.
- Sleeps ``random.uniform(0, 2.0)`` before each uncached probe so
  concurrent driver processes don't fire synchronously.
- Sends ``Authorization: Bearer <api_key>`` resolved from the
  per-model ``openai_api_key`` ``SecretStr``.
- Filters non-dict / id-less entries from ``/models`` responses so a
  misbehaving server doesn't crash the probe.
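
The caching and jitter rules listed above can be sketched as follows. The function name, the injected probe callable, and the {"data": [{"id": ...}]} payload shape are assumptions for illustration; the real helper also builds the Authorization header, which is left out here.

```python
import random
import time

SUCCESS_TTL_S = 300.0  # 5 minutes
FAILURE_TTL_S = 30.0
MAX_JITTER_S = 2.0

_cache = {}  # base_url -> (expires_at, set of model ids, or None after a failure)


def fetch_available_models(base_url, probe, *, now=time.monotonic,
                           sleep=time.sleep, jitter=random.uniform):
    """Return cached model ids for base_url, probing at most once per TTL window."""
    entry = _cache.get(base_url)
    if entry is not None and now() < entry[0]:
        return entry[1]  # cache hit: no jitter sleep, no network call
    sleep(jitter(0.0, MAX_JITTER_S))  # spread out concurrent process start-ups
    try:
        payload = probe(base_url)
    except Exception:  # deliberately broad: this is a best-effort start-up check
        _cache[base_url] = (now() + FAILURE_TTL_S, None)
        return None
    data = payload.get("data", []) if isinstance(payload, dict) else []
    if not isinstance(data, list):
        data = []
    # Skip non-dict and id-less entries instead of crashing on them.
    models = {
        item["id"] for item in data
        if isinstance(item, dict) and isinstance(item.get("id"), str)
    }
    _cache[base_url] = (now() + SUCCESS_TTL_S, models)
    return models
```

Injecting now, sleep, and jitter keeps the sketch testable without real waiting; the short failure TTL means a provider that was briefly down is probed again after 30 seconds rather than 5 minutes.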

``_verify_models`` itself now short-circuits on non-string
``base_url`` values — the loop previously walked into the network
path with MagicMock-shaped URLs from unit tests and tripped warning
logs every router construction.

346 tests pass across tests/llm tests/infra tests/memory; ruff clean.
Three follow-ups from the self-review audit of the previous two
commits.

_safe_response_preview now strips ANSI / CSI escape sequences as a
unit via regex before the per-character control filter.  The earlier
per-char filter only dropped the ESC (0x1B) byte and left the CSI
parameter bytes (``[31m``, ``[0;1;33;45m`` …) intact as readable
text.  An attacker-controlled response prefixed with ~1 KB of
``\x1b[31m`` would survive the strip as 1 KB of ``[31m[31m…``,
filling the 512-char preview budget and dropping the legitimate
trailing error message.  The added test covers this exactly.

Stripping now runs before truncation: if the readable error
message is in the tail of a 4 KB response with a control-byte burst
in the head, the strip compresses the head first so the tail
survives the preview budget.

NBSP (U+00A0) is now preserved along with the rest of the
printable-Unicode range: the condition flipped from ``ord(ch) > 0xA0``
to ``ord(ch) >= 0xA0``, and U+00A0 is the only code point on which the
two comparisons differ.  Without this a Memory API response containing
NBSP got mangled.
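
Putting the three follow-ups together, a preview function along these lines would strip CSI sequences as a unit, drop remaining control characters, and only then truncate. The regex and the function body are a sketch under those stated rules, not the code from concept_api.py.

```python
import re

# ESC [ + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F) + final byte (0x40-0x7E)
_ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
MAX_PREVIEW_CHARS = 512


def safe_response_preview(text, limit=MAX_PREVIEW_CHARS):
    """Return a log-safe preview of an untrusted response body."""
    # 1. Remove whole escape sequences so "[31m" fragments do not survive as text.
    stripped = _ANSI_CSI.sub("", text)
    # 2. Keep printable ASCII and everything from NBSP (U+00A0) upward; this drops
    #    CR, LF, NUL, a lone ESC, DEL, and the C1 control range.
    cleaned = "".join(
        ch for ch in stripped if 0x20 <= ord(ch) < 0x7F or ord(ch) >= 0xA0
    )
    # 3. Truncate last, so a control-byte burst at the head cannot eat the budget.
    return cleaned[:limit]
```

For example, a body made of 200 repetitions of the red-color escape followed by "boom" previews as just "boom", because the escapes are removed before the 512-character limit is applied.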

build_retry_for_idempotent_post now accepts an ``allowed_methods=``
override, matching ``build_retry`` for callers that need a narrower
set than the default HEAD/GET/OPTIONS/PUT/DELETE/POST.

350 tests pass across tests/llm tests/infra tests/memory; ruff clean.
@GrigoryEvko

Author

Conflict map vs PR #10 (fix/llm-output-sanitization)

| File | Hunks | Type |
| --- | --- | --- |
| gigaevo/llm/models.py | 1 (around _verify_models) | This PR rewrites the verification loop (process-wide TTL cache + jitter + auth header + helper extraction); PR #10 adds sanitize_for_log around the Model NOT FOUND warning line |

Resolution (whichever PR merges second):

Take this PR's restructured loop (_fetch_available_models_at helper, _model_base_url / _model_api_key extracted, for m in self.models: if _model_base_url(m) != base_url: continue inner loop), and re-apply PR #10's sanitize_for_log wrapping inside the warning interpolation:

logger.warning(
    "[MultiModelRouter:{}] Model {} NOT FOUND on {}. Available: {}",
    self._name,
    sanitize_for_log(m.model_name),
    base_url,
    sorted(available),
)

The conflict is intrinsic — this PR restructures the function while PR #10 sanitises the warning args within it. Merge order is interchangeable; the second-merged PR needs the one-line union shown above.

Development

Successfully merging this pull request may close these issues.

Stability of async httpx during long-running operation

5 participants